The 5'-most gene, gene 1, of the genome of the murine coronavirus mouse hepatitis virus (MHV) is presumed to encode
the viral RNA-dependent RNA polymerase. We have determined the complete sequence of this gene of the JHM strain
by cDNA cloning and sequencing. The gene is 21,798 nucleotides long and includes two large, overlapping
open reading frames. The first open reading frame, ORF 1a, is 4488 amino acids long. The second open
reading frame, ORF 1b, overlaps ORF 1a for 75 nucleotides and is 2731 amino acids long. The overlapping region may
fold into a pseudoknot RNA structure, similar to the corresponding region of the RNA of the avian coronavirus infectious
bronchitis virus (IBV). In vitro transcription and translation studies of this region indicated that these two ORFs
are most likely translated into one polyprotein by a ribosomal frameshifting mechanism. Thus, the predicted molecular
weight of the gene 1 product is more than 800,000 Da. The sequence of ORF 1b is very similar to the corresponding
ORF of IBV. In contrast, the ORFs 1a of these two viruses differ in size and are highly divergent. The amino
acid sequence analysis suggested that ORF 1a contains several functional domains, including two hydrophobic,
membrane-anchoring domains and three cysteine-rich domains. It also contains a picornaviral 3C-like protease domain and
two papain-like protease domains. The presence of these protease domains suggests that the polyprotein is most likely
processed into multiple protein products. ORF 1b, in contrast, contains polymerase, helicase, and zinc-finger
motifs. These sequence studies suggested that the MHV gene 1 product is involved in RNA synthesis, and that this
product is processed autoproteolytically after translation. This study completes the sequence of the MHV genome,
which is 31 kb long and constitutes the largest viral RNA known. © 1991 Academic Press, Inc.


INTRODUCTION 

Mouse hepatitis virus (MHV), a murine coronavirus,
contains a single-stranded, positive-sense RNA genome
(Lai and Stohlman, 1978; Wege et al., 1978). The
genomic organization is well understood (Spaan et al.,
1988; Lai, 1990). It contains 8 genes, each of which is
expressed from the 5'-end of a polycistronic mRNA
species. These mRNAs have a 3'-coterminal, nested-set
structure (Lai et al., 1981). Starting from the 5'-end
of the genome, the genes are named 1, 2a, 2b, 3, and
so on until gene 7 (Cavanagh et al., 1990). Genes 2b, 3,
6, and 7 encode the four known viral structural proteins,
i.e., HE (hemagglutinin-esterase), S (spike), M
(membrane), and N (nucleocapsid) proteins, respectively.
The remaining genes presumably encode nonstructural
proteins, most of which are yet to be identified
in virus-infected cells. The nucleotide sequences
of genes 2 to 7 have been determined for two
strains, A59 and JHM, of MHV (Armstrong et al., 1983,
1984; Skinner et al., 1985; Skinner and Siddell, 1983,

Sequence data from this article have been deposited with the 
EMBL/GenBank under Accession No. M55148. 
1985; Schmidt et al., 1987; Luytjes et al., 1987, 1988;
Shieh et al., 1989). Altogether these seven genes account
for roughly 9.5 kb. The remaining gene, gene 1,
which is the 5'-most gene, has been estimated to be
longer than all of the other genes combined
(Pachuk et al., 1989; Baker et al., 1990). Only the 5'-terminal
5.3 kb in the JHM strain and the 3'-terminal 8.4 kb of
this gene in the A59 strain have so far been sequenced
(Soe et al., 1987; Baker et al., 1989; Pachuk et al.,
1989; Bredenbeek et al., 1990). The corresponding
gene of an avian coronavirus, infectious bronchitis
virus (IBV), has been completely sequenced and
shown to be 20 kb long (Boursnell et al., 1987). This
IBV gene consists of two open reading frames (ORFs),
which can be translated into a polyprotein via a ribosomal
frameshifting mechanism (Brierley et al., 1987,
1989). Again, the gene products have yet to be detected
in virus-infected cells. The size of MHV gene
1 has not been determined. From the approximate
sizes of the cDNA clones, it has been estimated to be
roughly 22-23 kb (Pachuk et al., 1989; Baker et al.,
1990). Comparison of the published partial sequences
of gene 1 showed that IBV and MHV share sequence
similarity in the 3'-terminus of the gene (Bredenbeek et
Fig. 1. Molecular clones and restriction map of gene 1 of the genomic RNA of MHV-JHM. (a) Schematic diagram of the MHV-JHM genome
and restriction map of the cDNA clones. (b) The cDNA clones used for sequencing. Abbreviations: B, BamHI; E, EcoRI; H, HindIII; K, KpnI; N,
NcoI; P, PstI. Lengths are expressed in kilobase pairs.


al., 1990), and yet their 5'-ends have diverged (Soe et al.,
1987; Baker et al., 1989). Thus, the evolutionary relationship
of these two viruses in gene 1 is not clear.

Several pieces of evidence suggest that gene 1 may
encode proteins which are directly involved in viral
RNA synthesis. First, since the MHV virion does not contain RNA
polymerase (Brayton et al., 1982), this enzyme has to
be synthesized from the incoming virion genomic RNA.
This translation is only possible if the gene is located at
the 5'-end of the genome. Second, RNA recombination
studies using temperature-sensitive (ts) mutants indicated
that the ts lesions affecting RNA synthesis are
localized within the gene 1 region (Keck et al., 1987).
This conclusion has been confirmed by RNA recombination
mapping studies (Baric et al., 1990). Third, the
3'-half of the gene 1 sequences of IBV and MHV-A59
contains the sequence motifs for RNA polymerase and
helicase, which are the activities expected to be involved
in RNA synthesis (Boursnell et al., 1987; Gorbalenya
et al., 1989b; Bredenbeek et al., 1990). However,
these postulated functions have not been directly demonstrated.
At least one enzymatic activity, i.e., an autoprotease
(Baker et al., 1989), has been associated with
the gene product. The presence of the protease activity
suggests that the gene 1 product is likely to be processed
into multiple proteins.


The properties of the coronavirus RNA polymerase
are of considerable interest since coronavirus
RNA synthesis utilizes an unusual mechanism of discontinuous
transcription, probably involving a free
leader RNA species (Lai, 1988). An understanding of
the RNA polymerase should shed further light on the
mechanism of RNA synthesis. To this end, we have
obtained the complete sequence of gene 1 of the JHM
strain of MHV. This gene is nearly 22,000 nucleotides
long and contains two overlapping ORFs, similar to the
corresponding IBV gene. Sequence analysis shows
that the MHV gene may have undergone extensive divergence
from the IBV gene, particularly in its 5'-half.
Several functional domains were identified, which may
be important for the processing and the enzymatic activities
of its gene product.

MATERIALS AND METHODS 

Virus and cells. The plaque-cloned JHM strain of 
MHV (Makino et al., 1984) was used throughout this 
study. The virus was propagated on DBT cells (Hirano
et al., 1974) at an m.o.i. of 1. Virus was harvested and
purified from the medium, and viral RNA was prepared 
as previously described (Makino et al., 1984). 
cDNA cloning. The cDNA clones encompassing 
Fig. 2. Hydropathy profiles of the predicted amino acid sequences of ORF 1a and ORF 1b. Values above the line are hydrophobic and values
below the line are hydrophilic. The hydropathicity was calculated using a moving window of 40 amino acids, with a value plotted every 16
residues (Kyte and Doolittle, 1982).
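
The moving-window calculation described in the Fig. 2 legend can be sketched as follows. This is a minimal illustration, not the program used for the figure: the function name `hydropathy_profile` and the toy input sequence are hypothetical, and the only values taken from the legend are the window of 40 residues and the step of 16 residues. The per-residue values are the standard Kyte-Doolittle hydropathy indices.

```python
# Kyte-Doolittle hydropathy index per residue (Kyte and Doolittle, 1982).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=40, step=16):
    """Return (start_index, mean hydropathy) for windows taken every `step` residues.

    Hypothetical helper: window and step default to the values in the Fig. 2 legend.
    """
    seq = seq.upper()
    profile = []
    for start in range(0, len(seq) - window + 1, step):
        segment = seq[start:start + window]
        mean = sum(KD[aa] for aa in segment) / window
        profile.append((start, mean))
    return profile


# Toy sequence (hypothetical, not from ORF 1a): a hydrophobic run followed by lysines.
toy = "L" * 40 + "K" * 16
for start, mean in hydropathy_profile(toy):
    label = "hydrophobic" if mean > 0 else "hydrophilic"
    print(start, round(mean, 2), label)
```

Positive means correspond to values plotted above the line in Fig. 2, negative means to values below it.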

gene 1 were obtained by using specific synthetic oligonucleotides
as primers and purified virion genomic
RNA as template. Initially, the sequences of these oligonucleotides
were derived from RNA sequence analysis
of the RNase T1-resistant oligonucleotides which
had been mapped to either gene 1 or 2 (Shieh et al.,



Fig. 3. Diagram of the codon preference in the region between ORF 1a and ORF 1b. The codon usage patterns for the three reading frames of
the predicted amino acid sequences at the junction between ORF 1a and ORF 1b are shown. The two stop codons at 13600 (TAG) and
13679 (TAA) are marked. The codon usage table was generated for genes 3, 6, and 7, which encode the viral structural proteins, of MHV-JHM
(Schmidt et al., 1987; Skinner and Siddell, 1983), and used for comparison with ORFs 1a and 1b. The parameters used are a window length of 25
and a maximum scale of 1.1 (Gribskov et al., 1984).
IBV-M42  5' 12337 GAUAAGAAUUAUUUAAACGGGUACGGGGUAGCAGUG-AGGCUCGGCUGAUACCCCUUGCUAGUGG 3'
MHV-JHM  5' 13643 GACACGAAUUUUUUAAACGGGUUCGGGGUACAAGUGUAAAUGCCCGUCUUGUACCCUGUGCCAGUGG 3'
MHV-A59  5'   284 GACACGAACUUUUUAAACGGAUUCGGGGUACAAGUGUAAAUGCCCGUCUUGUACCCUGUGCCAGUGG 3'
Fig. 4. Comparison of the RNA sequences and the proposed secondary structure of the MHV-JHM, MHV-A59, and IBV RNAs at the junction
between ORF 1a and ORF 1b. (A) Alignment of nucleotide sequences. The first nucleotides are numbered according to Boursnell et al. (1987) for
IBV and Bredenbeek et al. (1990) for MHV-A59, and termination codons are underlined. (B) Tertiary RNA structure at the region of ribosomal
frameshifting. The potential signal for ribosomal frameshifting is boxed, and the stop codon is underlined. Arrows indicate the differences in the
RNA sequence of MHV-JHM in comparison with that of IBV (boldfaced) and MHV-A59 (outlined).


1987, 1989; Soe et al., 1987). cDNA synthesis was
performed by the general method of Gubler and Hoffman
(1983). The double-stranded cDNA molecules
were trimmed with T4 DNA polymerase and ligated to
pTZ18U (United States Biochemical Corp.) either by 
blunt-end ligation or EcoRl linker ligation. The recombi- 



Fig. 5. SDS-PAGE analysis of in vitro translated products. (A) Diagram of the plasmids used and the predicted sizes of the translation products
from the transcribed RNAs. (B) Plasmid pTZ(FrSh) was linearized with either HindIII (lanes 2, 5, and 8), generating a full-length transcript by T7
RNA polymerase, or with DraI (lanes 1, 4, and 7), generating a 0.5-kb RNA. Translation was performed in a rabbit reticulocyte lysate system using
[35S]methionine. Translation products were analyzed directly (lanes 1-3) or after immunoprecipitation using the ORF 1a-specific antiserum
(lanes 4-6) or rabbit preimmune serum (lanes 7-9). M indicates molecular weight markers in kilodaltons; lanes 3, 6, and 9, translation of
pTZ(ORFaug).
tional cDNA cloning to obtain overlapping cDNA
clones. 

DNA sequencing. Sequencing was performed as
previously described (Shieh et al., 1987, 1989). Both
chemical modification (Maxam and Gilbert, 1980) and
dideoxynucleotide chain termination (Sanger et al.,
1977) methods were used directly on plasmid DNA
(Chen and Seeburg, 1985).

Construction of recombinant plasmids for the frameshifting
analysis. Subcloning and mutagenesis of
cDNA clone T-12 were accomplished using synthetic
oligonucleotides and polymerase chain reaction (PCR).
Briefly, oligomer #166 (5'-GATCGAATTCCTTTACATGGTGAAGGGGTG-3'),
which extends from nucleotide
13,147 to 13,167 of gene 1 and contains mismatches
at both nucleotides 13,154 and 13,156, and oligomer
#199 (5'-CATATGACACAGGATCCTTTATGCC-3'),
which is complementary to nucleotides 13,529 to
13,553 and includes the BamHI site at nucleotide
13,537, were used for DNA amplification by PCR according
to the standard procedures (Saiki et al., 1988).
The resulting PCR DNA product encompasses sequences
from nucleotide 13,147 to 13,537 with a specific
mutation (T to A) at nucleotide 13,154 and another
(T to G) at nucleotide 13,156, resulting in the introduction
of an ATG codon. The DNA was then digested with
Fig. 6. Dot matrix comparison of the predicted amino acid sequences
of ORF 1a and ORF 1b of MHV-JHM and IBV. The profiles
were generated by the compare/word option from the Genetics
Computer Group program (Devereux et al., 1984) with a word-size of
2 and alphabet of 20 for ORF 1a (a) and 21 for ORF 1b (b).
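
The idea behind a word-based dot matrix can be sketched as follows. This is a simplified illustration that records exact matches of 2-residue words only; it is not the GCG compare/word implementation, whose scoring and alphabet handling may differ, and the function name `dot_matrix` is hypothetical. The two example peptides are the IBV and MHV sequences listed in Table 1 for the site at the N-terminus of the 3C-like protease.

```python
def dot_matrix(seq_a, seq_b, word=2):
    """Return (i, j) pairs where the word starting at seq_a[i] equals the word at seq_b[j].

    Hypothetical helper: exact word matching only, as a stand-in for a dot plot.
    """
    # Index every word of seq_b by its start positions.
    index = {}
    for j in range(len(seq_b) - word + 1):
        index.setdefault(seq_b[j:j + word], []).append(j)
    # Look up every word of seq_a in that index.
    dots = []
    for i in range(len(seq_a) - word + 1):
        for j in index.get(seq_a[i:i + word], []):
            dots.append((i, j))
    return dots


ibv_site = "VSRLQSGFKK"  # IBV, Table 1
mhv_site = "TSFLQSGIVK"  # MHV, Table 1
print(dot_matrix(ibv_site, mhv_site))
```

Consecutive dots with equal offsets (i - j constant) form the diagonals that indicate colinear similarity in a plot such as Fig. 6.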


nant DNAs were transformed into Escherichia coli 
strain MV1190 competent cells (Dagert and Ehrlich,
1979). Homopolymer dC tailing of the 3'-ends of the
cDNAs using terminal transferase was also used to
anneal them to PstI-linearized pBR322 with oligo(dG) tails,
and these were transformed into E. coli strain MC1061. Specific
cDNA clones were identified using 5'-end-labeled oligonucleotides
as probes and confirmed by subsequent
hybridization to viral mRNA (Shieh et al., 1987). Once
the sequences of the cDNA clones were obtained, oligonucleotides
complementary to the 5'-ends of these
clones were synthesized to serve as primers for addi- 



Fig. 7. Comparison of the sequence and structure of the putative
metal-binding domain of ORF 1b from MHV-JHM and IBV. (a) Alignment
of amino acid sequences. The amino acid residues are numbered
with respect to ORF 1b. Asterisks indicate the conserved Cys
and His residues. Arrows show the putative cleavage sites for the
3C-like proteases. The open triangles indicate the residues putatively
liganded with the metal ion in the case of IBV (Gorbalenya et al.,
1989b). These amino acids are substituted in MHV, but neighboring
residues preserve the metal-binding domain. (b) Predicted structure
of the metal-binding domain of MHV-JHM ORF 1b. M, metal cation
(Zn2+). Only one of the several possible foldings of this domain is
shown.
MHVA (547-1020) LLENVDLFVKRRAEFACKFATCGDGLVPLLLD-GLVPRSY-YL—IKSGQA—FTSLM

IBVF1 (199- 677) IFENVNELPQRIAALKMAFAKCARSITVVVVERTLVVKEFAGTCLASINGAVAKFFEELP 
MHVA VNF—SREVVDMC—MDMALLFMHDVKVATKYVKKVTGKVAVRFKALGIAVVRKITEWFDLAVDTAASAA 

IBVF1 NGFMGSKIFTTLAFFKEAAVRWENIPNAPRGIKGFEWGNAKGTQVVVRLMRNDLTLLDQKADIPVEPE 
MHVA GWLCYQLVNGL-FAVANGVITFIQEVPNYQEFINNQHFNSHLHPPELVKNFVDKFKTFFKVLIDSMSVSI 

IBVF1 GW-SAILDGHLCYVFRSGDRFYAAPLSGNFALSDVHCCERVVCLSDGVTPEIND-GLILAAIYSSFSVSE 
MHVA LSGLTVVKTASNRVCLAGSKVYEVVQKSLPAYIMPVGCSEATCLVGEIEPAVFEDDV-VDVVKAPLTY-Q 
IBVF1 L--VTALKKGEPFKFLGHKFVY—AKDAAVSFTLAKAATIADVLRLFQSARVIAEDVWSSFTEKSFEFWK
MHVA GCCKPPSSFEKICIVDKLYMAKCGDQFYPVVVDNDTVGVLDQ-CWRFPCAGKKVV-FNDKPKVKEVPSTR 

IBVF1 LAYGKVRNLEEF-VKTYVCKAQMSIVILAAVLGEDIWHLVSQVIYKLGVLFTKVVDFCDKHWKGFCVQLK 
MHVA KIK11FALDATFDSVLSKACSEFEVDKDVTL-DELLDVVLDAVESTLSPCKEHGVIGTKVCALLKGWWTI 

IBVF1 RAKLIVTETFCVLKGVAQHCFQLLLDAIHSLYKSFKKCALGRIHGDLLFWK—GGVHKIVQDGDEIWFDA 
MHVA MSIFLMKEAKKLLPSRMYVLSAPDEDCVATDVVYADENQDDDADDPVVLVADTQEEDGVAREQVDSADSE 

IBVF1 IDS-VDVEDLGVVQEKSIDFEVCDDVTLPENQPGHMVQIEDDGKNYMFFRFKKDENIYYTPMSQLGAINV 
MHVA ICVAHTGGQEMT 319 residues 

IBVF1 VC—KAGGKTVT 346 residues 

MHVA (1340-1501) VCFVKGDVI—KVLRRVGAEVIVNPANGRMAHGAGVAGAIAKAAGKAFINETADMVKA 
IBVF1 (1018-1183) TCVGDLTVVIAKALDEFKEFCIVNAANEHMTHGSGVAKAIADFCGLDFVEYCEDYVKK 
MHVA QGVCQVGGCYESTGGKLCKKVLNIVGPDARGHGNECYSLLERAYQH—INKCDNVVTTLISAGIFSVPTD 
IBVF1 HGPQQRLVTPSFVKGIQC--VNNVVGPR-HGDNNLHEKLVA-AYKNVLVDGVVNYVVPVLSLGIFGVDFK

MHVA VSLTYL-LGVVTKNVILVSNNQDDFDVIE-KC-QVTSVAGT 132 residues 

IBVF1 MSIDAMREAFEGCTIRVLLFSLSQEHIDYFDVTCKQKTIYLTE 0 residues 

MHVA (1634-2058) DGVNFRSCCVAEGEVFGKTLGSVFCDGINVTKVRCSAIHKGKVFFQYSGLSAADLAAV 

IBVF1 (1184-1597) DGVKYRSIVLKPGDSLGQ-FGQVYAKNKIVFTA—DDVEDKEILY-VPTTD-KSI 

* 

MHVA KDAFGFDEPQLLQYYSMLGMCKWPVVVCGNYFAFKQSNNNCYINVACLMLQHLSLKFPKWQWRRPGNEFR 
IBVF1 LEYYGLDAQKYVIYLQTLAQ-KWNVQYRDNFLILEWRDGNCWISSAIVLLQAAKIRFKGF-LTEAWAKLL

MHVA SGKPLRFVSLVLAKGSFKFNEPSDSTDFIRVELR—EADLRSATCDLEFICKCGVKQEQRKGVDA-VMHF 

IBVF1 GGDPTDFVAWCYASCTAKVGDFSDANWLLANLAEHFDADYTNAFLKKRVSCNCGIKSYELRGLEACIQPV 

Fig. 8. Alignment of the ORF 1a of MHV-JHM and IBV. The overall alignment was generated by combining segments aligned by programs
OPTAL (Gorbalenya et al., 1989a) and MULTALIN (Corpet, 1988). It consists of four distinct pieces separated by regions that could not be
aligned with certainty. For the latter regions, only the total numbers of amino acid residues are indicated. The amino acid numbers of the first and
the last residues of each aligned segment are indicated in parentheses. Two dots, identical residues; single dots, similar residues. Conserved
Cys residues are highlighted by boldface. Asterisks, putative catalytic residues of proteases; arrows, putative cleavage sites for 3C-like
proteases. Box, the putative cleavage site for 3CLpro in IBV substituted by a KR dipeptide in MHV-JHM. The IBV sequence was from Boursnell et al.
(1987). MHVA: ORF 1a of MHV. IBVF1: ORF 1a of IBV.
MHVA GTLDKSGLVKGYNIACTCG-DKLVHCTQFNVPFLI —CSNTPEGKKLPDDWAANIFTGGS-VGH-YTHV

IBVF1 RATNLLHFKTQYSNCPTCGANNTDE VIEASLPYLLLFATDGPATVDCDEDAVGTVVFVGSTNSGHCYTQA 
MHVA KCKPKYQLYDACNVSKVSEAKGNFTDCLYLK-NLKQTFSSVLTTYYLDDVKCVAYKPDLSQYYCESGKYY 
IBVF1 AGQA-FD—NLAKDRKFGK-KSPYITAMYTRFAFKNE-TS-LPVAKQSKGKSKSVKEDVSNLATSSKASF 
MHVA TKPIIKAQFRT-FEKVEGVYTNFKLVGHDIAEKLNAKLGFDC-NSPFMEYKITEWPTATGDWLASDDLY 

IBVF1 DN—L-TDFEQWYD--SNIYESLK-V-QESPDNFDKYVSFTTKEDSKLPLTLKVR-GIKSVVDFRSKDGF
MHVA VSRYSGGCVTFGK-PVIWRGHEEASLKSL 178 residues 

IBVF1 IYKLTPDTDENSKAPVYYPVLDAISLKAI 54 residues 

MHVA (2237-4488) PKVVKAKAIACYGAVKWFLLY—CFSWI-KFNT—DNKVIYTTEVASKLTFK-LCCLA 
IBVF1 (1652-3945) PNLERIFNIAKKAIVGSSVVTTQCGKLIGKAATFIADKVGGGVVRNITDSIKGLCGIT 

MHVA -FKNAL-QTFNWSVVSRGF-FLVATV—FLLWFNFLYANVILSDF YLPNIGPLPMFVGQIVAWVK 

IBVF1 RGHFERKMSPQFLKTLMFFLFYFLKASVKSVVASYKTVLCKVVLATLLIVWFVYTSNPVMFTGIRV—LD 
MHVA TTFGVLTICDFY-QVTDLGYRS-SFCNGSMVCELCFSGFDMLDNYESINVVQHVVDRRVS---FDYISLF 
IBVF1 FLFEG-SLCGPYKDYGKDSFDVLRYCADDFICRVCLHDKDSLHLYKHAYSVEQVYKDAASGFIFNWNWLY 
MHVA KLVVELVI—GYSLYTVCFYPLFVLVGMQLLTTWLPEFFMLGTMHWSARLFVFVANMLPAFTLL—RFYI 

IBVF1 LVFLILFVKPVAGFVIICYCVKYLVLNSTVLQT—GVCF-LDW-FVQTVFSHFNFMGAGFYF

MHVA VVTAMYKVYCLCRHVMYGCSKPGCLFCYKRNRSVRVKCSTVVGGSLRYYDVMANGGTGFCTKHQWNCLNC 
IBVF1 WL—FYKIYIQVHHILY-CKDVTCEVCKRVARSNRQEVSVVVGGRKQIVHVYTNSGYNFCKRHNWYCRNC 

MHVA NSWKPGNTFITHEAAADLSKELKRPVNPTDSAYYSVIEVKQVGCSMRLFYE RDGQRVYDDV SAS 

IBVF1 DDYGHQNTFMSPEVAGELSEKLKRHVKPTAYAYHVVDEACLVDDFVNLKYKAATPGKDSASSAVKCFSVT 
MHVA LFVDMNGLLHSKVK—GVPETHWWENEADKA—GFLNAAVFYAQSL YRPMLMVEKKLITTANTGLS VS 
IBVF1 DFLKKAVFLKEALKCEQISNDGFIVCNTQSAHALEEAKNAAIYYAQYLCKPILILDQALYEQLVVE-PVS 
MHVA RTMFDLYVYSLLRH-LDVDRKSLTSFVNAAHNSLKEGVQLEQVMDTFVGCARRKCAIDSDVETKSITKSV 

IBVF1 KSVIDK-VCSILSS11SVDTAAL-NYKAGTLRDALLSIT-KDEEAVDMAI 

MHVA MAAVNAGVEVTDESCNNLVPTY-VKSDTIVAADLGVLIQNNAKHVQSNVAKAANVACIWSVDAFNQLSAD 
IBVF1 FCH-NHDVDYTGDGFTNVIPSYGIDTGKLTPRDRGFLINADASIANLRVKNAPPV—VWKFSELIKLSDS 
MHVA -LQHRLRK AC VKTGLKIKLTYNKQEAN VP ILTTPFSL—K-GGAV-FSR VLQWLFV-ANLI C 

IBVF1 CLKY-LISATVKSGVRFFITKSGAKQVIACHTQKLLVEKKAGGIVSGTFKCFKSYFKWLLIFYILFTACC 

Fig. 8. Continued


EcoRI and BamHI and subcloned into pTZ18U, yielding
pTZ(FSaug). The specific mutations were confirmed by
DNA sequencing.

Plasmid pTZ(FSaug) was digested with BamHI and
HindIII (HindIII site in the polylinker of pTZ18U) and ligated
to a 626-bp BamHI-HindIII DNA fragment derived
from the clone T-12. The resulting plasmid
pTZ(FrSh) consists of the sequence from nucleotides
13,147 to 14,164 of gene 1.
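
The sizes of the translation products expected from the pTZ(FrSh) transcript (19 kDa without frameshifting and 37 kDa with a -1 frameshift, as reported in the Results) can be approximated from the coordinates given in the text. The sketch below is a rough check only. It assumes an average residue mass of 110 Da, takes the ORF 1a stop codon to begin at nucleotide 13,679 (the TAA marked in Fig. 3), treats nucleotide 14,164 as the end of the translated region, and ignores any vector-derived codons; the names are hypothetical.

```python
AVG_RESIDUE_DA = 110    # assumed average mass per amino acid residue (Da)

ATG_START = 13154       # first nucleotide of the introduced ATG codon
STOP_1A = 13679         # first nucleotide of the TAA stop codon of ORF 1a (Fig. 3)
INSERT_END = 14164      # last gene 1 nucleotide present in pTZ(FrSh)


def codons_between(first_nt, last_nt):
    """Number of complete codons in the inclusive nucleotide range."""
    return (last_nt - first_nt + 1) // 3


# No frameshift: read from the ATG up to the codon preceding the ORF 1a stop.
no_shift_aa = codons_between(ATG_START, STOP_1A - 1)

# -1 frameshift: one nucleotide is read twice, so one extra nucleotide is
# effectively available before the end of the insert.
shift_aa = codons_between(ATG_START, INSERT_END + 1)

no_shift_kda = no_shift_aa * AVG_RESIDUE_DA / 1000
shift_kda = shift_aa * AVG_RESIDUE_DA / 1000

print(no_shift_aa, round(no_shift_kda, 1))
print(shift_aa, round(shift_kda, 1))
```

Under these assumptions the two estimates round to the 19-kDa and 37-kDa values quoted for the in vitro translation experiment.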

Plasmid pTZ(ORFaug) consists of the sequences from
nucleotide 13,671 to 14,164 of gene 1. An ATG codon
was introduced at nucleotide 13,678-13,680 by PCR-
MHVA —FI VLWALMPTYAVHKSDMQLPL Y-ASFKVIDNGVLRDVSVTDACF ANKFNQFDQWYESTFGLVYYRNS

IBVF1 SGYYYM-EVSKSFVHPMYDVNSTLHVEGFKVIDKGVLREIVPEDTCFSNKFVNFDAFWGRP-YDNS 

MHVA KACPVVVAVIDQDIGHTLFNVPTKV- -LRYGFHVLH-FI THAFATDRVQCYTPHMQIPYDNF 

IBVF1 RNCPIVTAVIDGD-GTVATGVPGFVSWVMDGVMFIHMTQTERKPWYIPTWFNREIVG-YTQDSIITEGSF 

MHVA YASGCVLSSLCTMLAHADGTPHPYCYTEGVMHNASL-YSSLVPHVRYNLASSNGYIRFPEVVSEGIVRVV 

IBVF1 YTSIALFSARCLYLT-ASNTPQLYCFNGDNDAPGALPFGS11PHRVYFQPNGVRLIVPQQILHTPY—VV 

MHVA RTRSMTYCRVGLCEEAEEGICFNFNSSWVLNNPYYRAMPGTFCGRNAFDLIHQVLGGLVQPIDFFALTAS 

IBVF1 KFVSDSYCRGSVCEYTRPGYCVSLNPQWVLFNDEYTSKPGVFCGSTVRELMFSMVSTFFTGVNPNIYMQL 

MHVA SVAGAILAIIVVLAFYYLIKL KR AFGDYTSVVVINVIVWCINFLMLFVFQVYPTLSCLYACFYFYTTLYF 

IBVF1 ATM-FLILVVVVLIFAMVIKF QgJ VFKAYATTVFITMLVWVINAFILCVHSYNSVLAVILLVLYCYASLVT 

MHVA PSEISVVMHLQWLVM-YGAIMPLWFCITYVAVVVSNHA-LWLFSYCRKIGTDVRSD-—GTFEEMALT 

IBVF1 SRNTVIIMH-CWLVFTFGLIVPTWLACCYLGFIIYMYTPLFLWCYGTTKNTRKLYDGNEFVGNYDLAAKS 

MHVA TFMITKESYCKLKNSVSDVAFNRYLSLYNKYRYFSGKMDTATYREAACSQLAKAMETFNHNMV-MMFSIS 

IBVF1 TFVIRGSEFVKLTNEIGD-KFEAYLSAYARLKYYSGTGSEQDYLQACRAWLAYALDQYRNSGVEIVYTPP 
4 * 

MHVA SLLCTTSFLQSGIVKMVSPTSKVEPCVVSVTYGNMTLNGLWLDDKVYCPRHV I CSSADMTDPDYPNLLCR 

IBVF1 RYSIGVSRLQSGFKKLVSPSSAVEKCIVSVSYRGNNLNGLWLGDTIYCPRHVL GKFSGDQWNDVLNL 

MHVA VTSSDF-CVMSDRMSLTVMSYQMQGSLLVLTVTLQNPNTPKYSFGVVKPGETFTVLAAYNGRPQGAFHVV 

IBVF1 ANNHEFEVTTQHGVTLNVVSRRLKGAVLILQTAVANAETPKYKFIKANCGDSFTIACAYGGTVVGLYPVT

* 

MHVA MRSSHTIKGSFLCGSCGSVGYVLTGDSVRFVYMHQLELSTGCHTGTDFSGNFYGPYRDAQVVQLPVQDYT 
IBVF1 MRSNGTIRASFLAGACGSVGFNIEKGVVNFFYMHHLELPNALHTGTDLMGEFYGGYVDEEVAQRVPPDNL 

MHVA QTVNVVAWLYAAILN-RCNWF-VQSDSCSLEEFNVWAMTNGFSSIKADLVLDALASMTGVTVEQVL 

IBVF1 VTNNIVAWLYAA11SVKESSFSLPKWLESTTVSVDDYNKWAGDNGFTPFSTSTAITKLSAITGVDVCKLL 

MHVA AAIKRLHSGFQGKQILGSCVLEDELTPSDVYQQLAGVKLQSKRTRVIKG TCCWILASTFLFCS11SA 

IBVF1 RTIMVKNSQWGGDPILGQYNFEDELTPESVFNQIGGVRLQSSFVRKATSWFWSRCVLACFLFVLCAIVLF 
MHVA FVKWTMFMYVTTHMLGVTLCALCFVIFAMLLIKHKHLYLTMYIMPVLCTLFYTNYLVVGYK-QSFRGLAY 

IBVF1 TAVPLKFYVYAAVILLMAVL-FISFT-VKHVMAYMDTFLLPTLITVIIGVCAEVPFIYNTLISQVV 

MHVA AWLS-YFVPAVDYTYMDEVLYGVVLLVAMVFVTMRSINHDVFSTMFLVGRLVSLVSMWYFGANLEEEVLL 
IBVF1 IFLSQWYDP-VVFDTMVPWMFLPLVLYT-AFKCVQGCYMNSFNTSLLMLYQFVKLGFVIYTSSNTLTAYT 

Fig. 8. Continued

mediated mutagenesis by a method similar to that used for
pTZ(FSaug).

In vitro transcription and translation. Recombinant
plasmids pTZ(ORFaug) and pTZ(FrSh) were linearized by
digestion with restriction enzymes HindIII or DraI and
transcribed in vitro with T7 RNA polymerase as
previously described (Soe et al., 1987). The resulting
RNA was translated in the mRNA-dependent rabbit reticulocyte
lysate (Promega Biotech) in the presence of
[35S]methionine. Reactions were carried out in a final
NGFGVQV

NGYGVAVRLG 


Fig. 8. Continued


volume of 25 µl under conditions recommended by the
manufacturer. The translation products were immunoprecipitated
by the method of Shin and Morrison (1989)
and analyzed by electrophoresis on a 7.5 to 15% polyacrylamide
gel.

Computer analysis of nucleotide and amino acid sequences.
Sequence data were analyzed on a VAX
1852 using the GCG sequence analysis software
package developed by the Genetics Computer Group of
the University of Wisconsin. Detailed comparative analyses
of coronavirus protein sequences were done by
programs MULTALIN (Corpet, 1988), OPTAL (Gorbalenya
et al., 1989a), DOTHELIX (Leontovich et al.,
1990), and SITE (Koonin et al., 1990). The programs
DOTHELIX and SITE are parts of the GENBEE program
package for biopolymer sequence analysis.


RESULTS 

Molecular cloning of gene 1 of the genomic RNA
of MHV-JHM. To clone the gene 1 region, which represents
more than two thirds of the MHV genome, a synthetic
oligonucleotide (oligo 30; 5'-CTGAA I IT GGGGGTTGGG-3')
was initially used as a primer for cDNA
synthesis (Shieh et al., 1987). The sequence of this
oligonucleotide was based on the sequence analysis
of the RNase T1-resistant oligonucleotide No. 30,
which had previously been mapped to gene 2 (Makino
et al., 1984). The resulting cDNA clones contained inserts
ranging from 0.5 to 3 kb in size. These cDNA
clones detected only the genomic RNA on Northern
blots of intracellular RNA from MHV-infected cells (data
not shown). Based on the nested-set structure of MHV





















LEE ET AL. 


[Fig. 9 diagram: IBV 1a and MHV 1a drawn to scale (scale in units of 10^3). Legend: no alignment; moderate similarity; high similarity; papain-like protease; X-domain; 3C-like protease; cysteine-rich domain; membrane domain.]

Fig. 9. A schematic presentation of the relationship between the
ORF 1a of MHV-JHM and IBV. The two ORFs 1a are shown to scale.
The designation of regions for which specific functional predictions
could be made, and of regions of similarity between the two viruses,
are shown at the bottom of the figure. High similarity, statistical significance
over 10 SD (standard deviations) when aligned by the program
OPTAL (Gorbalenya et al., 1989a,b); moderate similarity, significance
of 3 to 10 SD. The alignments in the regions with predicted
functions were significant at the level of at least 5 SD. Regions of
similarity between the two viruses are joined. Vertical arrows, putative
cleavage sites for 3CLpro. Horizontal arrows, putative papain-like
proteases (two copies in MHV-JHM, and one copy in IBV).
mRNAs (Lai et al., 1981), this result indicated that
these cDNA clones represent part of gene 1. The 5'-ends
of these DNAs were sequenced, and synthetic
oligonucleotides complementary to these sequences
were generated to prime further cDNA synthesis for
walking toward the 5'-end of gene 1. In this way, overlapping
DNA clones which encompass about 11 kb at
the 3'-end of gene 1 were obtained (Fig. 1). cDNA
clones representing the 5'-terminal 6.2 kb of gene 1
were derived as described (Shieh et al., 1987; Baker et
al., 1989). The cDNA clones spanning the gap between
the two cDNA groups were obtained by using
specific primers representing the sequences
downstream and upstream of the gap as primers for
first-strand and second-strand cDNA synthesis, respectively.
The overlap of these cDNA clones was determined
by Southern blotting and confirmed by DNA
sequencing. The complete cloning of JHM gene 1 indicated
that gene 1 is approximately 22 kb
long (Fig. 1), longer than that of IBV (Boursnell et al.,
1987), in agreement with the previous estimate for
gene 1 of the A59 strain of MHV (Pachuk et al., 1989).

Analysis of the nucleotide sequence and the predicted
amino acid sequence. The complete MHV-JHM
gene 1 sequence was obtained from the cDNA clones
as indicated in Fig. 1. This sequence has been deposited
with GenBank (Accession No. M55148), and will
not be duplicated in this publication. The complete sequence
of gene 1 contains 21,798 nucleotides preceding
the UCUAUAC, which is the transcriptional initiation
site for gene 2 (Shieh et al., 1989). Analysis of the
sequence revealed two large, overlapping open reading
frames (ORFs), ORF 1a and ORF 1b (Fig. 1a). ORF
1a is 4488 amino acids long and has a predicted molecular
weight of 499,319, which includes the coding region
for p28 protein at its N-terminus (Soe et al., 1987).
The hydropathy plot (Kyte and Doolittle, 1982) shows
that ORF 1a has several long stretches of hydrophobic
regions at the carboxy-terminal region, which indicate
potential membrane-spanning domains (Fig. 2). ORF
1b, which overlaps ORF 1a for 75 nucleotides but is
located in a different reading frame, is 2731 amino
acids long with a predicted molecular weight of
308,483. The ORF 1b sequence is very similar to that
of MHV-A59 in both nucleotide and predicted amino
acid sequences (Bredenbeek et al., 1990). Only minor
substitutions were noted between the two strains (data
not shown). ORF 1b starts with CUG instead of
AUG. The first potential initiator codon AUG is located
399 nucleotides downstream of the first amino acid
Fig. 10. Alignment of the segments surrounding the putative catalytic
His and Cys residues of the coronavirus 3C-like protease with
the respective segments of other viral 3CLpro. The figure is an excerpt
of the complete alignment generated by program OPTAL. The
complete amino acid sequences of each viral 3CLpro are indicated,
but only the sequences around the catalytic residues are shown.
The numbers of amino acid residues to the known or postulated
termini of the respective viral 3CLpro and between the aligned segments
are indicated. For MHV 3CLpro, the postulated N-terminus is
at amino acid residue 3350 (Fig. 8 and Table 1). Residues identical or
similar to those in the coronavirus sequences are highlighted by
boldface. The arrow shows the Gly to Tyr substitution in the putative
substrate-binding sites of the coronavirus proteases. Asterisks, (putative)
catalytic residues. Abbreviations: PV1, poliovirus type 1, Mahoney
strain; HRV2, human rhinovirus type 2; EMCV, encephalomyocarditis
virus; FMDV, foot-and-mouth disease virus type A10;
HAV, hepatitis A virus; CPMV, cowpea mosaic virus; TBRV, tomato
black ring virus; BWYV, beet western yellows virus; SBMV, southern
bean mosaic virus; TEV, tobacco etch virus. For sources of the sequences,
see Gorbalenya et al. (1989b), except BWYV (Veidt et al.,
1988) and SBMV (Wu et al., 1987).
TABLE 1

Summary of the Predicted Cleavage Sites for the 3C-like Proteases in Coronavirus Polyproteins

                 IBV                    Putative protein                     MHV
-------------------------------------   next to the         -------------------------------------
aa sequence   aa position   Size        C-terminus of       aa position   Size        aa sequence
                            (# of aa)   the cleavage site                 (# of aa)

ORF 1a
VIKFQGVFKA    2583          196         MP1                 3160          190         LIKLKRAFGD??
VSRLQSGFKK    2779          307         3CLpro              3350          303         TSFLQSGIVK
GVRLQSSFVR    3086          293         MP2                 3652          288         GVKLQSKRTR
IATVQAKLSD    3379           83         ?                   3941           89         VSQIQSRLTD
STVLQSVTQE    3462          322         ?                   4030          307         NTVLQALQSE
NVVVQSKGHE    3784          144         GFL                 4337          137         TVRLQAGTAT
KSSVQSVAGA    3928          931         POL                 4474          918         GSQFQSKDTN

ORF 1b
PTTLQSCGVC     891          601         HEL                  939          600         SAVMQSVGAC
ETSLQGTGLF    1492          520         ?                   1539          519         NPRLQCTTNL?
FSALQSIDNI    2012          338         ?                   2058          374         FTRLQSLENV
YPQLQSAWTC    2350          302         ?                   2432          299         YPRLQAAADW

Note. In the aa position columns, the amino acid positions of the respective Q residues are indicated. The arrows show the predicted cleavage
sites. Abbreviations: MP1, MP2, putative membrane proteins flanking the 3CLpro at the N- and C-sides, respectively. POL: polymerase motif.
HEL: helicase motif. GFL: growth factor-like domain. The data on IBV were obtained from Gorbalenya et al. (1989b). The sequence analysis was
performed using the computer programs described under Materials and Methods.
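
The residue preferences around the predicted cleavage sites in Table 1 can be tallied with a short script. The sketch below is illustrative only: it assumes that each listed decapeptide has the residue at the numbered position as its fifth residue, with cleavage predicted between the fifth and sixth residues, and it uses the sequences as they appear in Table 1 of this text (question marks dropped). The function name `tally_sites` is hypothetical.

```python
from collections import Counter

# Decapeptides from Table 1 (ORF 1a followed by ORF 1b).
IBV_SITES = [
    "VIKFQGVFKA", "VSRLQSGFKK", "GVRLQSSFVR", "IATVQAKLSD", "STVLQSVTQE",
    "NVVVQSKGHE", "KSSVQSVAGA", "PTTLQSCGVC", "ETSLQGTGLF", "FSALQSIDNI",
    "YPQLQSAWTC",
]
MHV_SITES = [
    "LIKLKRAFGD", "TSFLQSGIVK", "GVKLQSKRTR", "VSQIQSRLTD", "NTVLQALQSE",
    "TVRLQAGTAT", "GSQFQSKDTN", "SAVMQSVGAC", "NPRLQCTTNL", "FTRLQSLENV",
    "YPRLQAAADW",
]


def tally_sites(sites):
    """Count the residue following Q at the fifth position; list sites lacking that Q."""
    following = Counter()
    without_q = []
    for pep in sites:
        if pep[4] == "Q":
            following[pep[5]] += 1
        else:
            without_q.append(pep)
    return following, without_q


for name, sites in (("IBV", IBV_SITES), ("MHV", MHV_SITES)):
    following, without_q = tally_sites(sites)
    print(name, dict(following), without_q)
```

In this tally the only listed site without Q at the fifth position is the MHV sequence LIKLKRAFGD, the site marked with question marks in the table.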


codon in ORF 1b. Nevertheless, the codon preference
plot suggests that the 399 nucleotides upstream of the
first AUG are most likely translated together with the
downstream sequences using the same reading frame
(Fig. 3). In light of the corresponding sequences of IBV
and MHV-A59 (Boursnell et al., 1987; Bredenbeek et
al., 1990), this result suggests that this region could be
translated via a ribosomal frameshifting mechanism
(Brierley et al., 1989).
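
The statement that the gene 1 product exceeds 800,000 Da follows from the two predicted molecular weights quoted above. The sketch below makes the arithmetic explicit. The summed value uses only numbers from the text; the second estimate additionally assumes that the 75-nucleotide overlap is translated only once in the frameshifted polyprotein and applies the mean residue mass of the two ORFs, so it should be read as an approximation.

```python
ORF1A_AA, ORF1B_AA = 4488, 2731          # ORF lengths in amino acids
ORF1A_MW, ORF1B_MW = 499_319, 308_483    # predicted molecular weights (Da)
OVERLAP_NT = 75                          # overlap between ORF 1a and ORF 1b

# Simple sum of the two predicted molecular weights.
summed_mw = ORF1A_MW + ORF1B_MW

# Mean residue mass implied by the two predictions.
mean_residue_mass = summed_mw / (ORF1A_AA + ORF1B_AA)

# Assumption: the overlap (25 codons) contributes once to the fusion protein.
fusion_aa = ORF1A_AA + ORF1B_AA - OVERLAP_NT // 3
fusion_mw_estimate = fusion_aa * mean_residue_mass

print(summed_mw)
print(fusion_aa, round(fusion_mw_estimate))
```

Both the simple sum and the overlap-corrected estimate remain above 800,000 Da.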

Comparison of tertiary structure of RNA in the frameshift regions. It has been proposed that the nucleotide sequences in the overlapping regions between ORF 1a and ORF 1b in IBV and MHV-A59 RNAs are able to fold into a pseudoknot tertiary structure, which is essential for efficient frameshifting and, thus, expression of the downstream ORF 1b (Brierley et al., 1989; Bredenbeek et al., 1990). Comparison of the primary sequences revealed that the corresponding region of MHV-JHM contains a "slippery" sequence, UUUAAAC, similar to that of IBV (Fig. 4A). The possible folding of RNA in this region into a pseudoknot tertiary structure is similar among IBV, MHV-A59, and MHV-JHM (Fig. 4B). It is interesting to note that the nucleotide changes between MHV-JHM and IBV in either the stem or loop regions are compensated by mutations at the complementary positions (Fig. 4B). This suggests the significance of the putative tertiary structure in ribosomal frameshifting. Only two nucleotides differ between MHV-JHM and MHV-A59 in this region; they are located immediately upstream and downstream of the UUUAAAC sequence.
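
Locating the slippery heptanucleotide in a sequence is a simple string search. The sketch below is illustrative only: the function name and the toy input are hypothetical, and the toy input is not a fragment of the MHV-JHM genome.

```python
SLIPPERY = "UUUAAAC"  # heptanucleotide named in the text


def find_slippery_sites(seq, motif=SLIPPERY):
    """Return 1-based start positions of every occurrence of motif in seq.

    The input may be RNA or cDNA; T is converted to U before searching.
    Overlapping occurrences are reported as well.
    """
    rna = seq.upper().replace("T", "U")
    sites = []
    start = rna.find(motif)
    while start != -1:
        sites.append(start + 1)  # convert 0-based index to 1-based position
        start = rna.find(motif, start + 1)
    return sites


if __name__ == "__main__":
    toy = "GGCUUUAAACGGG"  # hypothetical sequence for demonstration
    print(find_slippery_sites(toy))
```

A search of this kind identifies only candidate sites; whether a downstream pseudoknot can form has to be assessed separately.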

Ribosomal frameshifting in vitro. To confirm that ORF 1a and ORF 1b of MHV-JHM could be translated into one polypeptide by ribosomal frameshifting, we cloned the region spanning nucleotides 13,147 to 14,164 of gene 1 into an expression vector under the control of the T7 promoter for in vitro translation studies. Because this region lacks a translational initiation codon, an ATG codon was introduced by PCR-mediated mutagenesis at nucleotides 13,154-13,156. If the translation of this transcript terminates at the UAA stop codon in ORF 1a, a 19-kDa protein will be produced. However, if the -1 translational frameshift occurs, a 37-kDa protein will be synthesized. As shown in Fig. 5, the in vitro translation of this RNA yielded both proteins (lane 2). The 37-kDa protein was heterogeneous; the smaller proteins may represent aberrant translational initiation or specific processing of the translation products. The addition of protease inhibitors to the rabbit reticulocyte lysates did not alter this translation pattern (data not shown). The antiserum prepared against the amino acid sequence just upstream of the frameshift (unpublished) precipitated both proteins (lane 5). Surprisingly, the major products precipitated by this antiserum migrated faster than the respective primary translation products, suggesting that protein processing had occurred. None of the proteins was immunoprecipitated by the preimmune serum. As controls, the transcripts containing either the 5'- or the 3'-halves [pTZ(ORF aU9 )] of the ORF did not yield the 37-kDa protein. As predicted, only the products of the 5'-half were precipitated by this antibody (Fig. 5B, lane 4). These results are in agreement with the results obtained with IBV (Brierley et al., 1987) and MHV-A59 (Bredenbeek et al., 1990).

Fig. 11. Alignment of the putative coronavirus papain-like proteases. The numbers of the first and last residues of the aligned segments are indicated in parentheses. Both papain-like proteases of MHV are shown. Residues conserved in all three sequences (identical or similar) are highlighted in boldface. Asterisks, putative catalytic residues.

Fig. 12. Alignment of the segments around the putative catalytic residues of coronavirus papain-like proteases with the respective segments of papain-like proteases of cellular origin. The designations are as in Fig. 10. Abbreviations: MCP, mouse cysteine protease; catH, rat cathepsin H; DCP, Dictyostelium cysteine protease; catB, rat cathepsin B; catL, rat cathepsin L; CDP, chicken calcium-dependent protease. The sources of the sequences: Portnoy et al. (1986) (MCP, aleurain, actinidin, papain, DCP, catB); Dufour et al. (1988) (catL); Ohno et al. (1984) (CDP).
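
The predicted size of the frameshifted product can be checked with back-of-the-envelope arithmetic. The sketch below rests on two assumptions that the text does not state: an average residue mass of about 110 Da (a common rule of thumb), and translation of the fused product running from the introduced ATG at nucleotide 13,154 to the end of the cloned insert at nucleotide 14,164. The ORF lengths of 4488 and 2731 amino acids are those reported for ORF 1a and ORF 1b.

```python
AVG_RESIDUE_DA = 110.0  # assumed average mass per amino acid residue, in Da


def codons_spanned(first_nt, last_nt):
    """Number of complete codons between two nucleotide positions (inclusive)."""
    return (last_nt - first_nt + 1) // 3


def approx_kda(n_residues, avg_da=AVG_RESIDUE_DA):
    """Rough molecular mass in kDa for a polypeptide of n_residues."""
    return round(n_residues * avg_da / 1000.0, 2)


if __name__ == "__main__":
    fused = codons_spanned(13154, 14164)
    print(fused, "codons, about", approx_kda(fused), "kDa")

    polyprotein = 4488 + 2731  # ORF 1a + ORF 1b, ignoring the 75-nt overlap
    print(polyprotein, "residues, about", approx_kda(polyprotein), "kDa")
```

With these assumptions the insert encodes 337 codons, or roughly 37 kDa, in line with the size of the larger product, and the combined ORFs give a polyprotein on the order of 800 kDa.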

Analysis of sequence homology among MHV-JHM, MHV-A59, and IBV. The comparison of nucleotide and predicted amino acid sequences between MHV-JHM and IBV revealed considerable similarity between the two. The dot matrix comparison of the amino acid sequences shows that ORF 1b is very similar between MHV and IBV (Fig. 6). Overall, there is 47.7% similarity at the nucleotide level and 52.8% at the amino acid level. Like the ORF 1b of IBV, the MHV ORF contains the polymerase and helicase motifs at the corresponding positions (Gorbalenya et al., 1989b) (data not shown). The putative zinc-binding domain is also largely conserved between the two viruses. On the other hand, two of the residues implicated in metal binding for IBV (Gorbalenya et al., 1989b) are replaced in MHV, suggesting that the specific structures of the putative "fingers" may differ (Fig. 7). The ORF 1b sequences of MHV-JHM and MHV-A59 are also very similar (95.9% at the nucleotide level and 94.9% at the amino acid level) (data not shown).

In contrast, ORF 1a is more diverged (Fig. 8). The MHV ORF 1a is longer than the corresponding IBV ORF by 537 amino acids. The C-terminal half of ORF 1a is relatively conserved between MHV-JHM and IBV, while the N-terminal half is very diverged (Fig. 6). The alignment of amino acids in ORF 1a of MHV-JHM and IBV showed that there are four possible stretches of moderate homology, which are separated by highly diverged sequences (Fig. 8).
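
For readers who want to reproduce a figure of this kind, percent identity over an ungapped alignment can be computed as below. This is a minimal sketch and not the program used in this study: it counts identical residues only, whereas the similarity values quoted above may also score conservative substitutions. The example inputs are two 10-residue windows from Table 1.

```python
def percent_identity(a, b):
    """Percent of aligned positions at which two equal-length sequences match."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        raise ValueError("sequences must not be empty")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


if __name__ == "__main__":
    # IBV and MHV windows around the last putative ORF 1b site in Table 1
    print(percent_identity("YPQLQSAWTC", "YPRLQAAADW"))
```

Windows this short say little about overall relatedness; the function is meant to be applied to full-length aligned sequences.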

Analysis of the functional domains of ORF 1a. Although ORF 1a is highly diverged between MHV-JHM and IBV, common functional domains could be identified in this ORF of both viruses by detailed amino acid sequence analysis (see Materials and Methods) (Fig. 9). Two hydrophobic, potentially membrane-anchoring regions are present in the C-terminal half. There are three cysteine-rich domains, one of which contains a segment distantly resembling growth factors and their receptors (Gorbalenya et al., 1989b). In both coronaviruses, homologous domains of about 300 residues each have been identified as related to the putative 3C-like proteases (3CLpro) of picorna-, como-, nepo-, poty-, sobemo- and luteoviruses (Gorbalenya et al., 1989b). The sequences of the putative coronavirus 3C-like proteases possess certain unusual features distinct from those of other viral 3C-like proteases (Fig. 10, and see Discussion). The search for sequences resembling the cleavage sites for the 3C-like proteases revealed six conserved putative target sites for the MHV and IBV 3C-like proteases (Table 1) (see Discussion). These potential cleavage sites are localized in ORF 1b and the C-terminal half of ORF 1a. Interestingly, the N-terminal one of these cleavage sites marks the N-end of the putative 3C-like protease itself. Finally, there is a region of moderate conservation between MHV and IBV, which contains short segments resembling those around the catalytic Cys and His residues of papain-like proteases (Fig. 11). This region is duplicated in the MHV genome, but not in IBV, at an upstream site in ORF 1a. This upstream papain-like cysteine protease has been identified as the one responsible for the cleavage of p28 from the N-terminus of the gene 1 protein (Baker et al., 1989). A domain of considerable conservation between MHV and IBV (X domain in Fig. 9) has been found next to the putative coronavirus papain-like proteases. Interestingly, a homologous conserved domain also flanks the putative thiol proteases of alpha- and rubiviruses (A. E. Gorbalenya, unpublished observations).

DISCUSSION 

The complete sequence of gene 1 of MHV presented in this paper shows that this gene is probably the largest known viral gene among RNA viruses. Evidence was presented suggesting that the two ORFs in this gene may be translated into a large polyprotein. This interpretation is consistent with the lack of the transcriptional initiation signal (UCUAAAC) in the entire gene 1 sequence except at the extreme 5'-end. Although the putative "slippery" sequence (UUUAAAC) between ORF 1a and ORF 1b (Brierley et al., 1989) is similar to the transcriptional initiation signal, no major subgenomic mRNAs have been detected within this gene. Thus, this gene most likely encodes a single polyprotein of at least 800 kDa. The total size of the RNA genome of MHV is approximately 31 kb, which is considerably larger than that of any other known viral RNA. The evolution of the coronavirus RNA genome into such a large RNA may reflect the unusual mechanism of coronavirus RNA synthesis. The complexity of the discontinuous mode of coronavirus RNA synthesis (Lai, 1988) suggests that the coronavirus RNA polymerase needs a variety of different enzymatic activities.
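
The resemblance between the slippery sequence and the transcriptional initiation signal noted above can be made explicit by listing the positions at which the two heptanucleotides differ. The helper names below are illustrative.

```python
INITIATION_SIGNAL = "UCUAAAC"  # transcriptional initiation signal, as given in the text
SLIPPERY_SEQUENCE = "UUUAAAC"  # putative slippery sequence, as given in the text


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def differing_positions(a, b):
    """List (1-based position, base in a, base in b) for every mismatch."""
    return [(i + 1, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


if __name__ == "__main__":
    print(hamming(INITIATION_SIGNAL, SLIPPERY_SEQUENCE))
    print(differing_positions(INITIATION_SIGNAL, SLIPPERY_SEQUENCE))
```

The two motifs differ at a single position, the second nucleotide (C in the initiation signal, U in the slippery sequence).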

The amino acid sequence of gene 1 of MHV shows considerable similarity to that of IBV. ORF 1b is particularly conserved. Its degree of conservation between MHV and IBV is higher than that for any of the other genes in the coronavirus genomes. ORF 1b contains the polymerase, helicase, and metal-binding motifs (Gorbalenya et al., 1989b), suggesting that this region may be directly involved in RNA synthesis. These structural features are conserved between these viruses. The proposed pseudoknot structure, which is important for the ribosomal frameshifting that allows cotranslation of ORF 1a and ORF 1b (Brierley et al., 1989), is also highly conserved. This fact has previously been recognized in the partial sequence of gene 1 of MHV-A59 (Bredenbeek et al., 1990). The sequence differences between MHV-A59 and MHV-JHM within this junction region are located at nucleotides which do not affect the putative pseudoknot structure. In contrast, ORF 1a is much more diverged. It is nearly 2 kb longer than the ORF 1a of IBV, and contains several stretches of sequence which are not present in the IBV genome. These nonhomologous stretches of sequence are interspersed between the conserved regions. Furthermore, a papain-like protease domain, which is present once in the IBV genome, is duplicated in the 5'-half of the ORF 1a of MHV. The N-terminal sequence including p28, which is cleaved by the papain-like protease of MHV (Baker et al., 1989), is also highly diverged between MHV and IBV. Thus, it appears that the 5'-end of ORF 1a has undergone considerable sequence rearrangement and possibly recombination, while the remaining sequences in gene 1 are almost colinear between MHV and IBV.

In contrast to ORF 1b, which contains sequence motifs related to the synthesis of RNA, ORF 1a contains several domains suggestive of other functions. First, there are two long stretches of hydrophobic domains, which are conserved between IBV and MHV. The presence of these domains suggests that the gene 1 products may be anchored to the membrane. This possibility is consistent with the finding that MHV RNA synthesis occurs on the membrane fractions in the infected cells (Brayton et al., 1982). Second, there are three cysteine-rich regions, which are also homologous between MHV and IBV. The function of the Cys-rich domains is still not clear. However, it has been noted with IBV that the C-terminal Cys-rich domain is related to that of the growth factors and their receptors (Gorbalenya et al., 1989b). Third, there is a 3C-like protease domain (3CLpro) in the 3'-half of ORF 1a, which is also conserved in IBV. The putative catalytic His and Cys residues previously predicted in IBV have also been observed in MHV (Fig. 10). However, the putative coronavirus proteases remain unique in that they do not contain a conserved Asp(Glu) residue that could serve as the third catalytic residue, as suggested for the other 3C-like proteases (Gorbalenya et al., 1989b). Furthermore, the unusual substitution of Tyr for Gly in the putative substrate-binding region, described previously in IBV, is also observed in the putative MHV 3CLpro (Fig. 10). The potential cleavage sites for this 3C-like protease have been identified to be mainly in ORF 1b and the C-terminus of ORF 1a (Gorbalenya et al., 1989b). These sites (QS) are either conserved or converted to QA in MHV (Table 1). The potential cleavage at Q/S and Q/A sites by picornavirus 3CLpro has been demonstrated previously (Parks and Palmenberg, 1987). Two QG dipeptides proposed to be cleaved in IBV were substituted in MHV by QC in one case, and by the KR dipeptide in another (Table 1). Substitution of a C (unlike several other residues) for G in a cleavage site for encephalomyocarditis virus protease did not abolish processing in an in vitro system (Parks et al., 1989). Dibasic dipeptides are cleaved in the polyproteins of flaviviruses (Strauss and Strauss, 1988). Thus, these postulated cleavage sites are potentially cleavable by MHV 3CLpro despite the divergence. These cleavages could separate different functional domains of the gene 1 polyprotein into distinct protein products. Whether these sites are indeed cleaved in MHV-infected cells remains to be studied. Fourth, the N-terminal portion, which is the most diverged region, contains a papain-like protease domain, as pointed out previously for IBV (Gorbalenya et al., 1989b). The papain-like protease domain is duplicated in the MHV ORF 1a (Fig. 11) and is homologous with the known proteases (Fig. 12). This protease is probably involved in the cleavage of the N-terminus of the gene 1 polyprotein (Baker et al., 1989), which has been demonstrated in MHV-infected cells (Denison and Perlman, 1987). Site-specific mutagenesis studies demonstrated that this protease has Cys and His at its active site (unpublished observation).

The possible presence of the protease domains suggests that the gene 1 polyprotein is processed into many proteins. It has been shown that there are at least five to six complementation groups involved in MHV RNA synthesis, five of which have been mapped within gene 1 (Leibowitz et al., 1982; Baric et al., 1990). These proteins conceivably participate in various aspects of MHV RNA synthesis. None of the proteins has been detected so far.